Smart Paper Notes

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som

Categories: Computer Vision | Natural Language Processing

2022-08-22

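A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce BEiT-3, a general-purpose multimodal foundation model that achieves state-of-the-art transfer performance on both vision and vision-language tasks. Specifically, we advance the big convergence from three aspects: backbone architecture, pretraining task, and model scaling. We introduce Multiway Transformers for general-purpose modeling, where the modular architecture enables both deep fusion and modality-specific encoding. Based on the shared backbone, we perform masked "language" modeling on images (Imglish), texts (English), and image-text pairs ("parallel sentences") in a unified manner. Experimental results show that BEiT-3 obtains state-of-the-art performance on object detection (COCO), semantic segmentation (ADE20K), image classification (ImageNet), visual reasoning (NLVR2), visual question answering (VQAv2), image captioning (COCO), and cross-modal retrieval (Flickr30K, COCO).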


Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song

Categories: Natural Language Processing

2021-11-03

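This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation. We participated in all three evaluation tracks, the Large Track and two Small Tracks; the former is unconstrained and the latter two are fully constrained. Our submitted models were initialized with DeltaLM (https://aka.ms/deltalm), a generic pre-trained multilingual encoder-decoder model, and fine-tuned on the large collected parallel data and the data sources allowed by each track's settings. We also applied progressive learning and iterative back-translation to further improve performance. Our final submissions ranked first on all three tracks in terms of the automatic evaluation metric.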


Large Language Models Encode Clinical Knowledge

Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl

Categories: Natural Language Processing

2022-12-26

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.


Enhancing Cyber Resilience of Networked Microgrids using Vertical Federated Reinforcement Learning

Sayak Mukherjee, Ramij R. Hossain, Yuan Liu, Wei Du, Veronica Adetola, Sheik M. Mohiuddin, Qiuhua Huang, Tianzhixi Yin, Ankit Singhal

Categories: Machine Learning

2022-12-17

This paper presents a novel federated reinforcement learning (Fed-RL) methodology to enhance the cyber resiliency of networked microgrids. We formulate a resilient reinforcement learning (RL) training setup which (a) generates episodic trajectories injecting adversarial actions at primary control reference signals of the grid forming (GFM) inverters and (b) trains the RL agents (or controllers) to alleviate the impact of the injected adversaries. To circumvent data-sharing issues and concerns for proprietary privacy in multi-party-owned networked grids, we bring in the aspects of federated machine learning and propose a novel Fed-RL algorithm to train the RL agents. To this end, the conventional horizontal Fed-RL approaches using decoupled independent environments fail to capture the coupled dynamics in a networked microgrid, which leads us to propose a multi-agent vertically federated variation of actor-critic algorithms, namely federated soft actor-critic (FedSAC) algorithm. We created a customized simulation setup encapsulating microgrid dynamics in the GridLAB-D/HELICS co-simulation platform compatible with the OpenAI Gym interface for training RL agents. Finally, the proposed methodology is validated with numerical examples of modified IEEE 123-bus benchmark test systems consisting of three coupled microgrids.


Teaching Matters: Investigating the Role of Supervision in Vision Transformers

Matthew Walmer, Saksham Suri, Kamal Gupta, Abhinav Shrivastava

Categories: Computer Vision | Machine Learning

2022-12-07

Vision Transformers (ViTs) have gained significant popularity in recent years and have proliferated into many applications. However, it is not well explored how varied their behavior is under different learning paradigms. We compare ViTs trained through different methods of supervision, and show that they learn a diverse range of behaviors in terms of their attention, representations, and downstream performance. We also discover ViT behaviors that are consistent across supervision, including the emergence of Offset Local Attention Heads. These are self-attention heads that attend to a token adjacent to the current token with a fixed directional offset, a phenomenon that to the best of our knowledge has not been highlighted in any prior work. Our analysis shows that ViTs are highly flexible and learn to process local and global information in different orders depending on their training method. We find that contrastive self-supervised methods learn features that are competitive with explicitly supervised features, and they can even be superior for part-level tasks. We also find that the representations of reconstruction-based models show non-trivial similarity to contrastive self-supervised models. Finally, we show how the "best" layer for a given task varies by both supervision method and task, further demonstrating the differing order of information processing in ViTs.


Avoiding spurious correlations via logit correction

Sheng Liu, Xu Zhang, Nitesh Sekhar, Yue Wu, Prateek Singhal, Carlos Fernandez-Granda

Categories: Machine Learning | Natural Language Processing | Computer Vision | Machine Learning (Statistics)

2022-12-02

Empirical studies suggest that machine learning models trained with empirical risk minimization (ERM) often rely on attributes that may be spuriously correlated with the class labels. Such models typically perform poorly during inference on data lacking such correlations. In this work, we explicitly consider a situation where potential spurious correlations are present in the majority of training data. In contrast with existing approaches, which use the ERM model outputs to detect the samples without spurious correlations and then heuristically upweight or upsample those samples, we propose the logit correction (LC) loss, a simple yet effective improvement on the softmax cross-entropy loss, to correct the sample logit. We demonstrate that minimizing the LC loss is equivalent to maximizing the group-balanced accuracy, so the proposed LC could mitigate the negative impacts of spurious correlations. Our extensive experimental results further reveal that the proposed LC loss outperforms the SoTA solutions on multiple popular benchmarks by a large margin, an average 5.5% absolute improvement, without access to spurious attribute labels. LC is also competitive with oracle methods that make use of the attribute labels. Code is available at https://github.com/shengliu66/LC.


A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations

Sohan Rudra, Saksham Goel, Anirban Santara, Claudio Gentile, Laurent Perron, Fei Xia, Vikas Sindhwani, Carolina Parada, Gaurav Aggarwal

Categories: Robotics | Machine Learning

2022-11-29

Object-goal navigation (Object-nav) entails searching, recognizing and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are often restricted to considering static objects (e.g., television, fridge, etc.). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e.g. fruits, glasses, phones, etc.) that frequently change their positions due to human intervention. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability.
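As a rough illustration of "optimism in the face of uncertainty", the sketch below keeps a per-location estimate of how likely an object is to be spotted and adds an exploration bonus that shrinks as a location is visited more often. It is a simplified, non-contextual UCB1-style rule, not the algorithm from the paper; the class name `UCBLocationModel`, its methods, and the `bonus_scale` parameter are all hypothetical.

```python
import math


class UCBLocationModel:
    """Toy UCB1-style model of spotting likelihood per location (assumed design)."""

    def __init__(self, n_locations, bonus_scale=1.0):
        self.visits = [0] * n_locations      # times each location was visited
        self.successes = [0] * n_locations   # times the object was spotted there
        self.bonus_scale = bonus_scale       # hypothetical exploration weight

    def estimate(self, loc):
        """Empirical likelihood of spotting the object at `loc`."""
        if self.visits[loc] == 0:
            return 0.0
        return self.successes[loc] / self.visits[loc]

    def optimistic_score(self, loc):
        """Estimate plus an uncertainty bonus; unvisited locations score infinity."""
        if self.visits[loc] == 0:
            return float("inf")
        total = sum(self.visits)
        bonus = self.bonus_scale * math.sqrt(2.0 * math.log(total) / self.visits[loc])
        return self.estimate(loc) + bonus

    def choose(self):
        """Pick the location with the highest optimistic score (first on ties)."""
        scores = [self.optimistic_score(loc) for loc in range(len(self.visits))]
        return scores.index(max(scores))

    def update(self, loc, spotted):
        """Record one visit to `loc` and whether the object was spotted."""
        self.visits[loc] += 1
        self.successes[loc] += int(spotted)
```

In the abstract's pipeline, learned likelihoods like `estimate(loc)` would then serve as rewards for a weighted minimum latency solver that plans the robot's trajectory; that planning step is not sketched here.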


Performance evaluation of deep segmentation models on Landsat-8 imagery

Akshat Bhandari, Sriya Rallabandi, Sanchit Singhal, Aditya Kasliwal, Pratinav Seth

Categories: Computer Vision | Machine Learning

2022-11-27

Contrails, short for condensation trails, are line-shaped ice clouds produced by aircraft engine exhaust when aircraft fly through cold and humid air. They generate a greenhouse effect by absorbing or directing back to Earth approximately 33% of emitted outgoing longwave radiation. They account for over half of the climate change resulting from aviation activities. Avoiding contrails and adjusting flight routes could be an inexpensive and effective way to reduce their impact. An accurate, automated, and reliable detection algorithm is required to develop and evaluate contrail avoidance strategies. Advancement in contrail detection has been severely limited by several factors, primarily a lack of quality-labeled data. A large human-labeled Landsat-8 contrails dataset was recently proposed. Each contrail is carefully labeled with various inputs in various scenes of Landsat-8 satellite imagery. In this work, we benchmark several popular segmentation models with combinations of different loss functions and encoder backbones. This work is the first to apply state-of-the-art segmentation techniques to detect contrails in low-orbit satellite imagery. Our work can also be used as an open benchmark for contrail segmentation and is publicly available.


Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

Andros Tjandra, Nayan Singhal, David Zhang, Ozlem Kalinli, Abdelrahman Mohamed, Duc Le, Michael L. Seltzer

Categories: Natural Language Processing

2022-11-10

End-to-end multilingual ASR has become more appealing because of several reasons such as simplifying the training and deployment process and positive performance transfer from high-resource to low-resource languages. However, scaling up the number of languages, total hours, and number of unique tokens is not a trivial task. This paper explores large-scale multilingual ASR models on 70 languages. We inspect two architectures: (1) Shared embedding and output and (2) Multiple embedding and output model. In the shared model experiments, we show the importance of tokenization strategy across different languages. Later, we use our optimal tokenization strategy to train multiple embedding and output model to further improve our result. Our multilingual ASR achieves 13.9%-15.6% average WER relative improvement compared to monolingual models. We show that our multilingual ASR generalizes well on an unseen dataset and domain, achieving 9.5% and 7.5% WER on Multilingual Librispeech (MLS) with zero-shot and finetuning, respectively.


FUSION: Fully Unsupervised Test-Time Stain Adaptation via Fused Normalization Statistics

Nilanjan Chattopadhyay, Shiv Gehlot, Nitin Singhal

Categories: Computer Vision | Machine Learning

2022-08-30

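Staining reveals the microstructure of the aspirate when histopathology slides are created. Stain variation, defined as a chromatic difference between the source and the target, arises from varying characteristics of the staining process and results in a distribution shift and poor performance on the target. The goal of stain normalization is to match the target's chromatic distribution to that of the source. However, stain normalization can distort the underlying morphology, which may lead to an incorrect diagnosis. We propose FUSION, a new method that promotes stain adaptation by adjusting the model to the target in an unsupervised test-time scenario, eliminating the need for significant labeling at the target end. FUSION works by altering the target's batch normalization statistics and fusing them with the source statistics using a weighting factor. Depending on the weighting factor, the algorithm reduces to one of two extremes. Despite the lack of training or supervision, FUSION surpasses existing equivalent algorithms for classification and dense prediction (segmentation), as demonstrated by comprehensive experiments on two public datasets.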
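The FUSION abstract describes blending the target's batch normalization statistics with the source statistics through a weighting factor. A minimal sketch of that idea for a single channel follows. It assumes a simple convex combination, which the abstract does not spell out; the function names and the `alpha` parameter are hypothetical and not taken from the paper or any library.

```python
from statistics import fmean, pvariance


def fuse_bn_stats(source_mean, source_var, target_batch, alpha):
    """Blend stored source BN statistics with statistics of one target batch.

    Assumed convention: alpha = 1.0 keeps the source statistics unchanged,
    alpha = 0.0 uses only the target batch statistics.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    target_mean = fmean(target_batch)
    # Population (biased) variance, as batch normalization uses at normalization time.
    target_var = pvariance(target_batch, mu=target_mean)
    fused_mean = alpha * source_mean + (1.0 - alpha) * target_mean
    fused_var = alpha * source_var + (1.0 - alpha) * target_var
    return fused_mean, fused_var


def normalize(batch, mean, var, eps=1e-5):
    """Apply batch-norm style normalization with the given statistics."""
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]
```

With `alpha = 1.0` the model normalizes target activations with the unmodified source statistics, and with `alpha = 0.0` it relies purely on the target batch, which matches the two extremes mentioned in the abstract. Intermediate values interpolate between them.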

